Chhattisgarhi Raw Speech Corpus

Catalogue Number: 1436
Stock: In Stock

Dataset Description

138:09:27 hours (hh:mm:ss) | 88.9 GB | 140 speakers | 359 audio segments | 48 kHz | 16-bit WAV
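
As a rough aid for planning storage, the figures above can be related with simple arithmetic: uncompressed PCM audio takes duration × sample rate × bytes per sample × channels. The sketch below is only an illustration. The function names are hypothetical, and the channel count is an assumption, since the listing does not state it or say whether "GB" means decimal or binary gigabytes.

```python
def parse_duration(hms: str) -> int:
    """Convert an 'H:MM:SS' string into a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pcm_size_bytes(duration_s: int, sample_rate_hz: int,
                   bit_depth: int, channels: int) -> int:
    """Size of the raw PCM payload only (WAV headers and other files ignored)."""
    return duration_s * sample_rate_hz * (bit_depth // 8) * channels


duration_s = parse_duration("138:09:27")  # duration from the listing

# The channel count is NOT given in the listing; both values are assumptions.
for channels in (1, 2):
    size = pcm_size_bytes(duration_s, 48_000, 16, channels)
    print(f"{channels} channel(s): {size / 1e9:.1f} GB (decimal), "
          f"{size / 2**30:.1f} GiB (binary)")
```

The printed estimates can be compared with the listed 88.9 GB. Any difference may come from the channel layout, the unit convention, or additional files shipped with the audio.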


LDC-IL is extending its work to the mother tongues spoken in India, as part of a broader effort to support and promote the country's linguistic diversity. The collection of Chhattisgarhi speech data is a major part of this effort. Developing language technology for Indian mother tongues is intended to contribute to their overall enrichment and empowerment.

The Chhattisgarhi raw speech corpus consists of recordings of native Chhattisgarhi speakers from various parts of the state of Chhattisgarh. It represents a wide range of Chhattisgarhi varieties as spoken in different locations by diverse speakers. Speakers from various age groups read prompted extracts from literary and news texts. Spontaneous speech has also been collected.

A detailed description of the Chhattisgarhi Raw Speech Corpus will be available in the Chhattisgarhi Raw Speech Data Documentation.


For research citations, please use the following references:


1. Satyaendra Kumar Awasthi, Ankita Tiwari, Narayan Kumar Choudhary. 2023. Chhattisgarhi Raw Speech Corpus. Central Institute of Indian Languages, Mysore.

2. Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.

3. Choudhary, N. 2021. LDC-IL: The Indian Repository of Resources for Language Technology. Language Resources & Evaluation. Springer, Vol. 55, Issue 1. doi: https://doi.org/10.1007/s10579-020-09523-3

Item specifics

  • Authors: Satyaendra Kumar Awasthi, Ankita Tiwari, Shantanu Kumar, Rupesh Pandey, Saurabh Varik, Rajesha N., Manasa G., Srikanth D., Nithin S., Narayan Kumar Choudhary, Shailendra Mohan
  • Corpus Type: Raw Speech Corpus
  • Catalogue Number: 1436
  • ISBN: 978-81-19411-78-8
  • Data Source: On Field
  • Duration: 138:09:27
  • # of Audio Segments: 359
  • Release Date: 8-Jan-24
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview
